非平行的多域语音转换方法(例如Stargan-VC)在许多情况下已被广泛应用。但是,这些模型的培训通常由于其复杂的对抗网络体系结构而构成挑战。为了解决这个问题,在这项工作中,我们利用最先进的对比学习技术,并将有效的暹罗网络结构纳入Stargan歧视者。我们的方法称为Simsiam-Stargan-VC,它提高了训练稳定性,并有效地防止了训练过程中的歧视者过度拟合问题。我们对语音转换挑战(VCC 2018)数据集进行了实验,并进行了用户研究,以验证我们的框架性能。我们的实验结果表明,Simsiam-Stargan-VC在客观和主观指标方面显着优于现有的Stargan-VC方法。
translated by 谷歌翻译
通过恢复(实体瘤的响应评估标准)自动测量病变/肿瘤大小,直径和分割对于计算机辅助诊断很重要。尽管近年来已经研究了它,但仍有空间可以提高其准确性和鲁棒性,例如(1)通过合并丰富的上下文信息来增强功能,同时保持高空间分辨率,(2)涉及新任务和损失以进行关节优化。为了实现这一目标,本文提出了一个基于变压器的网络(Meaformer,测量变压器),用于病变恢复直径预测和分割(LRDPS)。它被配制为三个相关和互补任务:病变分割,热图预测和关键点回归。据我们所知,这是首次使用按键重点回归进行恢复直径预测。 MeaeFormer可以通过使用变压器来捕获其远程依赖性来增强高分辨率功能。引入了两个一致性损失,以明确建立这些任务之间的关系,以更好地优化。实验表明,MeAformer实现了LRDP在大规模深层数据集上的最新性能,并在纵向研究中产生了两个下游诊所的任务,即3D病变细分和恢复评估。
translated by 谷歌翻译
细颗粒实体打字(FET)旨在推断本文中提及的特定语义类型。 FET的现代方法主要集中于学习某种类型的外观。很少有作品直接建模类型差异,也就是说,让模型知道一种类型与其他类型不同的程度。为了减轻这个问题,我们提出了一种富含类型的FET的分层对比策略。我们的方法可以直接建模层次类型之间的差异,并提高区分多元类似类型的能力。一方面,我们将类型嵌入到实体上下文中,以使类型的信息直接感知。另一方面,我们在层次结构上设计了一个约束的对比策略,以直接建模类型差异,这可以同时感知不同粒度下类型之间的区分性。 BBN,Ontonotes和Figer的三个基准测试的实验结果表明,我们的方法通过有效建模类型差异在FET上实现了显着性能。
translated by 谷歌翻译
非平行的多与众不同的语音转换仍然是一项有趣但具有挑战性的语音处理任务。最近,基于有条件的自动编码器的方法AutoVC通过使用信息限制的瓶颈来删除说话者身份和语音内容,从而实现了出色的转换结果。但是,由于纯粹的自动编码器训练方法,很难评估内容和说话者身份的分离效果。在本文中,一个新颖的语音转换框架,名为$ \ boldsymbol t $ ext $ \ boldsymbol g $ uided $ \ boldsymbol a $ utovc(tgavc),提议更有效地将内容和音色与语音分开,其中预期的内容嵌入其中根据文本转录生产的旨在指导语音内容的提取。此外,对对抗性训练将用于消除从语音中提取的估计内容中的说话者身份信息。在预期内容嵌入和对抗培训的指导下,对内容编码器进行了培训,以从语音中提取嵌入说话者的内容。 Aishell-3数据集的实验表明,所提出的模型在自然性和转换语音的相似性方面优于AUTOVC。
translated by 谷歌翻译
尽管深度神经网络(DNNS)在音频分类任务中取得了巨大的成功,但它们的不确定性校准仍未得到探索。当它确定其预测时,应进行良好的模型应准确,并表明何时可能不准确。在这项工作中,我们研究了深度音频分类器的不确定性校准。特别是,我们从经验上研究了流行校准方法的性能:(i)蒙特卡洛辍学方法,(ii)集合,(iii)局灶性损失和(iv)光谱范围差异高斯工艺(SNGP),在音频分类数据集上。为此,我们评估了(I-IV),以应对环境声音和音乐流派分类的任务。结果表明,未校准的深度音频分类器可能过于自信,并且SNGP在本文的两个数据集中表现最好,并且非常有效。
translated by 谷歌翻译
最近,大型预磨损模型(例如,BERT,STYLEGAN,CLIP)在其域内的各种下游任务中表现出很好的知识转移和泛化能力。在这些努力的启发中,在本文中,我们提出了一个统一模型,用于开放域图像编辑,重点是开放式域图像的颜色和音调调整,同时保持原始内容和结构。我们的模型了解许多现有照片编辑软件中使用的操作空间(例如,对比度,亮度,颜色曲线)更具语义,直观,易于操作的统一编辑空间。我们的模型属于图像到图像转换框架,由图像编码器和解码器组成,并且在图像之前和图像的成对上培训以产生多模式输出。我们认为,通过将图像对反馈到学习编辑空间的潜在代码中,我们的模型可以利用各种下游编辑任务,例如语言引导图像编辑,个性化编辑,编辑式聚类,检索等。我们广泛地研究实验中编辑空间的独特属性,并在上述任务上展示了卓越的性能。
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
Theoretical properties of bilevel problems are well studied when the lower-level problem is strongly convex. In this work, we focus on bilevel optimization problems without the strong-convexity assumption. In these cases, we first show that the common local optimality measures such as KKT condition or regularization can lead to undesired consequences. Then, we aim to identify the mildest conditions that make bilevel problems tractable. We identify two classes of growth conditions on the lower-level objective that leads to continuity. Under these assumptions, we show that the local optimality of the bilevel problem can be defined via the Goldstein stationarity condition of the hyper-objective. We then propose the Inexact Gradient-Free Method (IGFM) to solve the bilevel problem, using an approximate zeroth order oracle that is of independent interest. Our non-asymptotic analysis demonstrates that the proposed method can find a $(\delta, \varepsilon)$ Goldstein stationary point for bilevel problems with a zeroth order oracle complexity that is polynomial in $d, 1/\delta$ and $1/\varepsilon$.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Stance detection refers to the task of extracting the standpoint (Favor, Against or Neither) towards a target in given texts. Such research gains increasing attention with the proliferation of social media contents. The conventional framework of handling stance detection is converting it into text classification tasks. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks are facing two main challenges which are insufficient labeled data and information in social media posts and the unexplainable nature of deep learning models. A new pre-trained language model chatGPT was launched on Nov 30, 2022. For the stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance for commonly used datasets including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanation for its own prediction, which is beyond the capability of any existing model. The explanations for the cases it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.
translated by 谷歌翻译